NetNews Offline 2

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Offline 2 / NetNews Offline Volume 2.iso / news / comp / lang / c-part1 / 3624 < prev next >

Wrap

Text File | 1996-08-05 | 4.1 KB | 83 lines

Newsgroups: comp.lang.c Path: cwi.nl!dik From: dik@cwi.nl (Dik T. Winter) Subject: Re: Matrix Multiplication Message-ID: <DLyvL5.9uH@cwi.nl> Sender: news@cwi.nl (The Daily Dross) Nntp-Posting-Host: chrysant.cwi.nl Organization: CWI, Amsterdam References: <822792812snz@genesis.demon.co.uk> <DLwzv9.9wu@cwi.nl> <822923024snz@genesis.demon.co.uk> Date: Mon, 29 Jan 1996 23:51:05 GMT In article <822923024snz@genesis.demon.co.uk> fred@genesis.demon.co.uk writes: > Right. Since in my version c is only ever written to cache line reads > are not necessarily an issue. { k innermost in c[i][j] += a[i][k] * b[k][j] .} > are not necessarily an issue. It is also outside the innermost loop so > (in your case where N=1000) a cache miss compared to a 1000 iteration > loop (with 1000 potential cache misses for b) should be insignificant. > Indeed the chance of being able to accumulate the result in the inner loop > in a register is likely to be more significant. See below. > > >Timings on an SGI with a 100 MHz MIPS R4k (in seconds with N = 1000, 10 > >multiplications): This was an error in my posting while the array sizes where 1000x1000 the computations were performed on 100x100 sections (and this makes a difference). > >i innermost 11.08 > >k innermost 5.92 > >k innermost + scalar 4.60 > >j innermost 2.59 > [ try using temp for a[i][k] when j is the innermost loop.] > since that is an optimisation the compiler definitely can't make for itself > if it can't tell that a and c are distinct. I just tried it (and also a temp for b[k][j] when i is innermost and it makes absolutely no noticeable difference (as expected). While the compiler is not allowed to keep a[i][k] in a register, the value will be kept in cache so each load of a[i][k] is just a cache read (unless a and c overlap of course). > > Actually this makes it clear why your version works fast - *everything* > in the inner loop caches well. The only disadvantage is that the c update > requires an 'external' read-modify-write sequence. The performance of that > will depend on how write caching is implemented. There is no need for an 'external' read-modify-write sequence! When an element of c is updated it just writes out to the cache. When an element of a or b loads that for some reason overlaps with the c element the effect is just a cache hit. The cache makes aliasing transparent. It depends on the workings of the cache whether a write to the cache also results in a direct write to memory or whether that can be delayed until there is no bus activity (or required because of replacement strategy). It is important to keep in mind that with increasing processor speeds a cache miss may result in a load taking several cycles (I know of one processor where a secondary cache miss results in a load of 157 cycles, compared to a 2 cycle load on a primary cache hit). Using the j index innermost most likely gives best performance as it does best with trying to optimize the number of cache hits, and timing suggest that it is the most insensitive to variations of the spacing of elements. While in some cases the "k innermost + scalar" performs spectaculary well (and I have yet to analyze why) there are many cases that it performs much worse. > Maybe C could do as well as FORTRAN in this case. Oh, it can (and in many other cases as well). But it depends a bit on processor architecture. Caches help a lot to alleviate the aliasing problem, but caches ought to be large enough and on most processors they are just way too small. Also the cache replacement strategy implemented, the cache-line size and the set-associativity of the cache make great differences. But Fortran suffers as well here. And were it will not fly is on vector-processors, on those there is nothing to help with possible aliasing. (On the other hand, I just had a try on a Cray C-90 and I do not understand the timings there... I have to analyze a bit more I think.) -- dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924098 home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/